The Influence of layout on the interpretation of referring expressions
From the introduction: The division of text into visual segments such as sentences, paragraphs and sections achieves many functions, such as easing navigation, achieving pragmatic effect, improving readability and reflecting the organisation of information (Wright, 1983; Schriver, 1997). In this paper, we report a small experiment that investigates the effect of different layout configurations on the interpretation of the antecedent of anaphoric referring expressions. Layout has so far played little role in Natural Language Generation (NLG) systems. The layout of output texts is generally very simple. At worst, it consists of only a single paragraph of a few sentences; at best it is predetermined by schemas (Coch, 1996; Porter and Lester, 1997) or discourse plans (Milosavljevic, 1999). However, recent work by Power (2000) and Bouayad et al. (2000) has integrated graphically signalled segments (e.g., by whitespace, punctuation, font and face alternation) such as paragraphs, lists, text-sentences and text-clauses in a hierarchical tree-like representation called the document structure. This work was carried out within the ICONOCLAST project (Integrating CONstraints On Layout and Style), which aims at automatically generating formatted texts in which the formatting decisions affect the wording and vice versa. If document structure affects the comprehensibility of referring expressions, this must be taken into account in any attempt to generate felicitous formatted texts. This would go a step beyond current research in the automatic generation of referring expressions, where only the effect of discourse structure and grammatical function has been investigated (Dale and Reiter, 1995; Cristea et al., 1998; Walker et al., 1998; Kibble and Power, 1999).
Discourse structuring of dynamic content
One of Natural Language Generation's continuing challenges is to determine the
structure and words of the generated linguistic output in accordance with the expertise of
the user, the content, the appropriate genre, style, etc. We focus on the determination of
the discourse structure. Most often, it is assumed that the same discourse relation always
holds between two content units. Approaches in which the choice of discourse relations
and the ordering of propositions depend on the interpretation of the content are still scarce.
However, such an interpretation is extremely important, especially if the content is highly
dynamic, for example when the data are time series. We present a text planner that
takes into account the constraints imposed by dynamic data to make decisions at every
stage of text planning, in particular for the selection of discourse relations and the
ordering of propositions. The work reported in this paper was carried out within
the MARQUIS project, funded by the European Commission under the eContent programme,
contract number EDC-11258; duration: 2005-2007.
Can text structure be incompatible with rhetorical structure?
Scott and Souza (1990) posed the problem of how a rhetorical structure (in which propositions are linked by rhetorical relations, but not yet arranged in a linear order) can be realized by a text structure (in which propositions are ordered and linked by appropriate discourse connectives). Almost all work on this problem assumes, implicitly or explicitly, that this mapping is governed by a constraint on compatibility of structure. We show how this constraint can be stated precisely, and present some counterexamples which seem acceptable even though they violate compatibility. The examples are based on a phenomenon we call extraposition, in which complex embedded constituents of a rhetorical structure are extracted and realized separately.
FootbOWL: Using a generic ontology of football competition for planning match summaries
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-21034-1_16
Proceedings of the 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29-June 2, 2011
We present a two-layer OWL ontology-based Knowledge Base (KB) that allows for flexible content selection and discourse structuring in Natural Language text Generation (NLG) and discuss its use for these two tasks. The first layer of the ontology contains an application-independent base ontology. It models the domain and was not designed with NLG in mind. The second layer, which is added on top of the base ontology, models entities and events that can be inferred from the base ontology, including inferable logico-semantic relations between individuals. The nodes in the KB are weighted according to learnt models of content selection, such that a subset of them can be extracted. The extraction is done using templates that also consider semantic relations between the nodes and a simple user profile. The discourse structuring submodule maps the semantic relations to discourse relations and forms discourse units, which it then arranges into a coherent discourse graph. The approach is illustrated and evaluated on a KB that models the First Spanish Football League.
Duplication in Corpora
We investigate duplication, a pervasive problem in NLP corpora. We present a method for finding it that uses word frequency list comparisons and experiment with this method on different units of duplication.
1 Introduction
Most corpora contain repeated material. In sampled corpora like the Brown Corpus, duplication is not so much of an issue, since the linguistic data is carefully selected proportionally by genre and thus the risk of introducing unwanted duplication is reduced. However, the typical corpus used in NLP is one in which as much data as possible of the desired genre is gathered. The result is a corpus whose nature and content are largely unknown. This issue has not, to our knowledge, been previously discussed in the literature. While we may expect the repeated occurrence of words or expressions to reflect their use in the language, the repetition of longer stretches of printed material (section-, paragraph- or even sentence-length) most likely does not. Text processing technolog..
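The idea of finding duplication by comparing word frequency lists can be illustrated with a minimal sketch. The abstract does not say which comparison measure or threshold the authors use, so the overlap measure, the `threshold` value and all function names below are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations
import re

def word_freq(text):
    """Return a word frequency list (word -> count) for one unit of text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def freq_overlap(a, b):
    """Share of word tokens common to both frequency lists (0.0 to 1.0).

    Assumed measure: size of the multiset intersection divided by the
    token count of the larger list. The paper may use something else.
    """
    total = max(sum(a.values()), sum(b.values()))
    if total == 0:
        return 0.0
    return sum((a & b).values()) / total

def find_duplicates(units, threshold=0.9):
    """Return index pairs of units whose frequency lists overlap >= threshold."""
    freqs = [word_freq(u) for u in units]
    return [
        (i, j)
        for i, j in combinations(range(len(units)), 2)
        if freq_overlap(freqs[i], freqs[j]) >= threshold
    ]

# Hypothetical units of duplication (here: sentences).
units = [
    "The cat sat on the mat.",
    "A completely different sentence about corpora.",
    "the cat sat on the mat",
]
print(find_duplicates(units))
```

The units could equally be paragraphs or sections. The pairwise comparison is quadratic in the number of units, so a real corpus would likely need some form of pre-filtering.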
Layout Annotation in a Corpus of Patient Information Leaflets
We discuss the problems and issues that arose during the development of a procedure for annotating layout in a corpus of Patient Information Leaflets. We show how the genre of the corpus as well as the aim of the annotation influenced the annotation scheme. We also describe the automatic annotation procedure.